knitr::opts_chunk$set(echo = TRUE)
library(knitr)
library(tidyverse)
library(dplyr)
library(sf)
library(sp)
library(tidycensus)
library(RSocrata)
library(spdep)
library(spatstat)
library(RColorBrewer)
library(viridis)
library(classInt)
library(reshape2)
library(FNN)
library(tm)
library(geojsonsf)
library(stats)
library(patchwork)
library(ggthemes)
library(gridExtra)
library(tmap)
library(stats)
library(patchwork)
library(ggcorrplot)
library(cowplot)
library(GWmodel)
library(gtable)
library(grid)
library(spgwr)
library(expss)
library(here)
library(janitor)
library(raster)
library(rosm)
library(leaflet)
library(rgdal)
library(mgcv)
options(scipen=999)The “Opioid epidemic” is a global health crisis (Krausz et al., 2021). The effectiveness of this drug class in pain treatment, modern anaesthesia and palliative care (Rosenblum et al., 2008, Rosner et al., 2019) is widely regarded in the medical world. Unfortunately, the substance is highly addictive (Waldhoer et al. 2004). Among the many side effects of opioid usage, such as digestive disorders, constipation and depression (Adams, 1889), the most notable is respiratory depression, which is the main cause of opioid overdose related deaths (Montandon and Slutsky, 2019). In the United States, there has been widespread over-prescription of opioids such as OxyContin since the 1990s (Lyden and Binswanger, 2019). This has led to increased dependency on synthetic opioids. Opioid overdose-related hospitalisations increased 64% between 2005 and 2014, and in 2016 over 42,000 Americans died from opioid overdoses (Lyden and Binswanger, 2019). Governments on the local and national levels of the country are actively pursuing solutions to mitigate the risks that this crisis poses to society.
There has been a great deal of research undertaken to understand the socio-economic and environmental factors exacerbating opioid overdoses and related deaths. Socio-economic marginalisation is an important contributor to opioid overdoses amongst the people using them (Draanen et al., 2020). Several studies have found opioid abuse to be more common amongst people recently released from prison (Brinkley-Rubinstein et al., 2018), those living in poverty (Marshall et al., 2019) and the unemployed (Hollingsworth et al., 2017), whereas higher measures of educational achievement (Marshall et al. 2019) and social support in the form of healthy relationships (Burns et al., 2004) have been found to be negatively associated with overdose cases. All of these factors point to the increasing need for policy to uplifting users of opioids with social and economic insecurity (Draanen et al., 2020) and using these associated risk factors to predict potential areas for future overdose prevalence.
While many of the mentioned studies focused on linear regression models and multivariate analysis to get to these conclusions, they often failed to take into consideration the spatial relationship that opioid overdoses have with these variables. Studies that have accounted for spatial variability have been able to highlight high levels of clustering facilitating the identification of areas which require immediate policy attention. For example, a study in Cincinnati, Ohio, not only found hotspots of heroin-related incidents via spatial clustering, but what variables were common amongst them (Choi et al., 2022). These hotspots were often found in areas with lower education rates, higher poverty rates and higher male percentages, consolidating some of the key findings in existing literature. A previous study in Cincinnati implemented spatial regression methods to factor in characteristics of the built environment as well as socio-economic variables, something that may have been more difficult to accomplish without the implementation of location data and spatial analysis (Li et al. 2019). While the proportion of parks, manufacturing districts and commercial sectors was positively associated with heroin-related emergency calls, there was a negative association seen with proximity to fast food restaurants, hospitals and opioid-treatment programs. Overall, advancements in spatial analytics have significantly progressed research in this field, with the aim to direct change in areas most needed and identify potential characteristics of areas that are most susceptible.
Adding to the previous work in this field, we aim to spatially characterise the risk factors associated with overdoses in the City of Mesa, located in the state of Arizona. Mesa is an ideal study site as the Arizona Department of Health Services has stated that every day over 5 people die from opioid overdoses within the state (Opioid Prevention, 2022). The governor of Arizona himself, Doug Ducey, has stated clearly how Arizona, and the nation as a whole is facing a public health emergency which requires targeted solutions to alleviate pain (Office of Governor Ducey, 2017). The City of Mesa publishes publicly available data on overdose incidents, providing an opportunity to determine the key characteristics of overdose hotspots which will inform future policymaking.
The objective of our research is to improve the targeting of interventions against heroin overdoses in Mesa. We aim to identify how demographic, socioeconomic, and geographic factors are associated with overdose deaths in order to inform the City about where to take proactive measures to prevent more incidences. To achieve this, we model the relationship between various potential risk factors and the number of overdoses using spatial regression.
We hypothesize that there will be higher overdose mortality in areas with greater minority and low-income populations. Our predictions are based on studies showing that heroin-related incidents were negatively correlated with median household income in Cincinnati (Li et al., 2019) and that incidents were growing most rapidly amongst Black and Latinx communities across US metropolitan areas (Lippold et al., 2019). We also predict that characteristics of the built environment such as distance from greenspace, zoning assignments, or indications of plight based on the aforementioned study (Li et al., 2019). Finally, we expect a lower number of incidents with regions of lower crime rates in accordance to results in similar work (Choi et al., 2022).
maricopa_crs <- 'EPSG:2223'
census_api_key("0f5f47c1f6197eb4d9a922410ed9dfb8af738b50", overwrite = TRUE)
#Boundary/outline of mesa
mesa_boundary <- st_as_sf(st_read("../City_Boundary.csv"), wkt = 'Geometry', crs = 4326, agr = 'constant')%>%st_transform(maricopa_crs)
#------------------------------------ Census Data ----------------------------------------------#
#Census Block group data: geometries + sociodemographic data
variables_of_interest2 <- c("pop" = "B01003_001",
"white_pop" = "B03002_003",
"black_pop" = "B03002_004",
"hispanic_pop" = "B03002_012",
"median_income" = "B19013_001",
"unemployed" = "B23025_005",
"total_labor_force" = "B23025_003",
"owner_occupied_homes" = "B25003_002",
"renter_occupied_homes" = "B25003_003",
"vacant_homes" = "B25002_003",
"total_homes" = "B25002_001"
)
mesa_cb_groups <- get_acs(geography = "block group", variables = variables_of_interest2,
year=2020, survey = "acs5", state=04, county=013, geometry=TRUE) %>% st_transform(maricopa_crs)
out_of_bounds <- c('040139413004', '040139413002', '040130101021', '040138169001', '040139413003', '1040133184002')
mesa_cb_groups<- mesa_cb_groups[ ! mesa_cb_groups$GEOID %in% out_of_bounds, ]
mesa_cb_groups <- mesa_cb_groups[mesa_boundary,]
mesa_cb_table <- mesa_cb_groups %>%
dplyr::select( -NAME, -moe) %>% spread(variable, estimate) %>%
mutate(area = st_area(geometry),
perc_white = white_pop/pop*100,
perc_black = black_pop/pop*100,
perc_hispanic = hispanic_pop/pop*100,
perc_unemployed = unemployed/total_labor_force*100,
perc_owners = owner_occupied_homes/total_homes*100,
perc_renters = renter_occupied_homes/total_homes*100,
perc_vacant = vacant_homes/total_homes*100)
mesa_cb_table$perc_vacant[mesa_cb_table$perc_vacant >100] <- 100
mesa_cb_table <- na.omit(mesa_cb_table)
#---------------------------------- Overdose Dataset ------------------------------------------#
overdoses <- st_as_sf(na.omit(read.socrata("https://data.mesaaz.gov/resource/qufy-tzv6.json")),
coords = c("longitude", "latitude"), crs = 4326, agr = "constant") %>%
st_transform(st_crs(maricopa_crs))
overdoses18_20 <- overdoses[overdoses$year == 2018 | overdoses$year == 2019 | overdoses$year == 2020, ]
overdose_intersect <- overdoses18_20 %>% st_join(., mesa_cb_table, join=st_within) %>%
group_by(GEOID) %>% summarize(count = n()) %>% st_drop_geometry()
colnames(overdose_intersect)[2] <- "overdose_count"
mesa_cb_table <- left_join(mesa_cb_table, overdose_intersect, by="GEOID")
mesa_cb_table$overdose_count[is.na(mesa_cb_table$overdose_count)] <- 0
#------------------------------------- Zoning Dataset --------------------------------------------#
zoning_parcels <- st_read("../Zoning Districts.geojson")%>%st_transform(st_crs(maricopa_crs))
zones <- c("resid_high", "resid_low", "CBD", "commercial")
zoning_parcels$zoning[str_detect(zoning_parcels$zoning, "^RM")] <- "resid_high"
zoning_parcels$zoning[str_detect(zoning_parcels$zoning, "^RS")] <- "resid_low"
zoning_parcels$zoning[str_detect(zoning_parcels$zoning, "^D")] <- "CBD"
commercial_list <- c("GC","LC","NC","OC")
zoning_parcels$zoning[zoning_parcels$zoning %in%commercial_list] <- "commercial"
zoning_parcels <- zoning_parcels[zoning_parcels$zoning %in% zones, ]
zoning_parcels <- zoning_parcels[c("zoning","geometry")]
mesa_cb_sub <- subset(mesa_cb_table, select = c(GEOID, geometry, area))
zoning_intersection <- st_intersection(mesa_cb_sub, st_make_valid(zoning_parcels))
zoning_intersection$inter_area <- st_area(zoning_intersection$geometry)
zoning_intersection <- aggregate(zoning_intersection$inter_area,
by=list(zoning_intersection$GEOID,zoning_intersection$zoning), FUN=sum)
zoning_intersection<-dcast(zoning_intersection, Group.1~Group.2)
zoning_intersection[is.na(zoning_intersection)] <- 0
colnames(zoning_intersection)[1] <- "GEOID"
mesa_cb_table <- left_join(mesa_cb_table, zoning_intersection,by="GEOID")
mesa_cb_table[is.na(mesa_cb_table)] <- 0
mesa_cb_table <- mesa_cb_table %>% mutate(
cbd_per = as.numeric(CBD/area*100),
resid_high_per = as.numeric(resid_high/area*100),
resid_low_per = as.numeric(resid_low/area*100),
commercial_per = as.numeric(commercial/area*100))
columns_to_keep <- c("GEOID", "pop", "area", "overdose_count", "median_income", "perc_white", "perc_black", "perc_hispanic",
"perc_unemployed", "perc_owners", "perc_renters", "perc_vacant",
"cbd_per", "resid_high_per", "resid_low_per", "commercial_per", "geometry")
mesa_cb_updated <- mesa_cb_table[columns_to_keep]
#------------------------------ City Property + Graffiti Dataset ------------------------------------------#
city_prop<- st_as_sf(na.omit(read.socrata("https://data.mesaaz.gov/resource/xms2-ya86.json")),
coords = c("longitude", "latitude"), crs = 4326, agr = "constant")%>%st_transform(st_crs(maricopa_crs))
city_prop <- city_prop %>% distinct(address, .keep_all = TRUE)
# public safety + greenspace + vacant property + community dev + graffiti
safety <- city_prop%>%filter(property_use %in% c("Public Safety--Fire/Police", "Park/Public Safety"))%>%
dplyr::select(property_use, geometry)
safety$property_use <- "public_safety"
green<- city_prop%>%filter(property_use %in% c("Parks", "Pocket Park", "Cemetery"))%>%
dplyr::select(property_use, geometry)
green$property_use <- "public_reenspace"
vacant_city<- city_prop%>%filter(property_use %in% c("Vacant", "Vacant (ADOT remnant)"))%>%
dplyr::select(property_use, geometry)
vacant_city$property_use <- "vacant_property"
community <- city_prop%>%filter(
property_use %in% c("Libraries", "Mesa Arts Center","Museums","Child Crisis Center", "Sequoia Charter School", "Senior Center"))%>%
dplyr::select(property_use, geometry)
community$property_use <- "comm_development"
graffiti_calls <- st_as_sf(na.omit(read.socrata("https://data.mesaaz.gov/resource/9spb-749m.json")),
coords = c("lon", "lat"), crs = 4326, agr = "constant")%>%st_transform(st_crs(maricopa_crs))
graffiti_calls$nearest_address <- removeNumbers(graffiti_calls$nearest_address)
graffiti_calls$date_reported <- as.Date(graffiti_calls$date_reported)
graffiti_calls$property_use <- "graffiti"
graffiti_calls <- graffiti_calls %>% distinct(date_reported, nearest_address, .keep_all = TRUE) %>%
filter(between(date_reported, as.Date('2018-01-01'), as.Date('2021-01-01'))) %>%
dplyr::select(property_use, geometry)
mesa_cb_updated <-
rbind(safety, green, vacant_city, community, graffiti_calls) %>%
st_join(., mesa_cb_updated, join=st_within) %>%
st_drop_geometry() %>%
group_by(GEOID,property_use) %>%
summarize(count = n()) %>%
full_join(mesa_cb_updated) %>%
spread(property_use, count, fill=0) %>%
st_sf() %>%
dplyr::select(-`<NA>`) %>%
na.omit() %>%
ungroup()
### nearest neighbor function from Public Policy Analytics Textbook
nn_function <- function(measureFrom,measureTo,k) {
measureFrom_Matrix <-
as.matrix(measureFrom)
measureTo_Matrix <-
as.matrix(measureTo)
nn <-
get.knnx(measureTo, measureFrom, k)$nn.dist
output <-
as.data.frame(nn) %>%
rownames_to_column(var = "thisPoint") %>%
gather(points, point_distance, V1:ncol(.)) %>%
arrange(as.numeric(thisPoint)) %>%
group_by(thisPoint) %>%
summarize(pointDistance = mean(point_distance)) %>%
arrange(as.numeric(thisPoint)) %>%
dplyr::select(-thisPoint) %>%
pull()
return(output)
}
### finding distance (nearest neighbors) to point data
st_c <- st_coordinates
st_coid <- st_centroid
mesa_cb_updated <-
mesa_cb_updated %>%
mutate(
dist_public_safety =
nn_function(st_c(st_coid(mesa_cb_updated)), st_c(safety),3),
dist_greenspace =
nn_function(st_c(st_coid(mesa_cb_updated)), st_c(green),3),
dist_vacant_prop =
nn_function(st_c(st_coid(mesa_cb_updated)), st_c(vacant_city),3),
dist_comm_dev =
nn_function(st_c(st_coid(mesa_cb_updated)), st_c(community),3),
dist_graffiti =
nn_function(st_c(st_coid(mesa_cb_updated)), st_c(graffiti_calls),3))
#---------------------------------- Police Incidents Dataset ---------------------------------------#
incidents <- st_as_sf(na.omit(read.socrata("https://data.mesaaz.gov/resource/39rt-2rfj.json")),
coords = c("longitude", "latitude"), crs = 4326, agr = "constant")%>%
st_transform(st_crs(maricopa_crs))%>%
filter(report_year == '2018' | report_year == '2019' | report_year == '2020')
incidents <- incidents[mesa_boundary,]
notapp <- c("DEATH", "DUI", "FOUND", "COLLISION", "INSURANCE", "LICENSE", "LOST", "PARKING", "MEDICAL",
"SUSPENSION", "DOG", "ANIMAL", "SPEED", "IGNITION", "REGISTRATION", "FAIL", "MENTALLY")
incidents <- incidents[ ! grepl(paste(notapp, collapse = "|"), incidents$crime_type), ]
incidents <- incidents[c("crime_id","geometry")]
incidents_intersect <- incidents %>% st_join(., mesa_cb_updated, join=st_within) %>%
group_by(GEOID) %>% summarize(count = n()) %>% st_drop_geometry()
colnames(incidents_intersect)[2] <- "arrests_count"
mesa_cb_final <- left_join(mesa_cb_updated, incidents_intersect, by="GEOID")
mesa_cb_final$arrests_count[is.na(mesa_cb_final$arrests_count)] <- 0
#st_write(mesa_cb_final, "../mesa_complete_data4.geojson", append=FALSE)We obtained all our demographic and socioeconomic data from the American Community Survey (ACS) which is an annual data collection program run by the United States Census Bureau. ACS is the premier source for detailed demographic and housing information in the United States. We obtained the most recent 5-year estimates (2020) on the census block group level which is the smallest geographical unit for which the Census Bureau provides data. There are 345 census block groups in Mesa with a substantial population. The information we collected includes the total population, the white, Black, and Hispanic population, the median household income, the total unemployed and the total labor force, the number of homeowners, the number of renters, the number of vacant properties, and the total number of properties. For all our main variables excluding income, we calculated the statistic as a proportion to account for varying population sizes across the census block groups.
The majority of our data sets come from the The City of Mesa Data Portal maintained by the municipal government. Our primary dataset is the “Fire and Medical Opioid Overdose Incidents”, which contains all confirmed cases of opioid overdoses and their locations rounded to the 1/3 mile increments. The overdose was either confirmed by the patient or witness, an opioid found on scene, or a positive response to Narcan treatment (City of Mesa). We obtained the number of overdose incidents from 2018 to 2020 and aggregated them by census block group.
We also collected the “Zoning Districts” data from the portal. The dataset contains delineated geographic areas in the city and their land use assignments. We specifically extracted land parcels assigned as high-density residential, low-density residential, and commercial. Additionally, we utilized the “City Owned Property” dataset which lists the properties owned or sold by the city and their locations. From this dataset we obtained point data on the location of parks and cemeteries (which we labelled as greenspace), libraries, arts centers, museums, crisis centers, and senior centers (which we labelled as community development resources), fire department and police stations (which we labelled as public safety), and municipal vacant property.
To investigate the degree of plight in neighborhoods, we used the “Transportation Graffiti” dataset which includes graffiti reports in the city. We obtained reports from 2018 to 2020 and filtered them by the date and location in order to collect only distinct reports.
To establish a smoother relationship to factors of our point datasets across space, we calculated average nearest neighbor features. Adapting code from Ken Steif’s Public Policy Analytics we calculated the distance from the centroid of each census block group to the 3 nearest factor points (Steif, 2021).
Finally, to investigate crime rates, we utilized the “Police Incidents” dataset which includes incidents based on police reports and arrests and their locations rounded to the nearest hundred block. We excluded all non-crime-related incidents such as car collisions, medical emergencies, or animal-related issues. We also excluded crimes that were related to driving such as DUIs, or missing driver’s license, or speeding, because of the uncertainty about if the location of the incident is associated with the individual charged.
After feature engineering, our list of potential independent variables include population, White population (as percent), Hispanic population (as percent), Black Population (as percent), median household income, unemployed population (as percent), homeowner population (as percent), renter population (as percent), vacant homes (as percent), land zoned as high-density residential (as percent), land zoned as low-density residential (as percent), land zoned as commercial (as percent), number of crime-related police incidents, average distance to the 3 nearest: graffiti reports, community development centers, greenspace, public safety, and vacant property.
maricopa_crs <- 'EPSG:2223'
mesa_dataset<- st_read("../mesa_complete_data4.geojson")%>%st_transform(st_crs(maricopa_crs))
all_cols <- c("GEOID", "overdose_count", "pop", "arrests_count", "median_income",
"perc_white", "perc_black", "perc_hispanic",
"perc_unemployed", "perc_owners", "perc_renters", "perc_vacant",
"resid_high_per", "resid_low_per", "commercial_per",
"dist_public_safety", "dist_greenspace", "dist_vacant_prop", "dist_comm_dev", "dist_graffiti",
"geometry")
ind_vars <- c("pop", "arrests_count", "median_income",
"perc_white", "perc_black", "perc_hispanic",
"perc_unemployed", "perc_owners", "perc_renters", "perc_vacant",
"resid_high_per", "resid_low_per", "commercial_per",
"dist_public_safety", "dist_greenspace", "dist_vacant_prop", "dist_comm_dev", "dist_graffiti")The chart below shows the frequency distribution of opioid overdose for Mesa census blocks. We find that the distribution of overdoses is heavily right-skewed. The mean and median of overdose cases are 5.1 and 3 respectively. However, a few census blocks have a relatively large number of overdose cases compared to others. It may represent that opioid overdose in Mesa may be more prevalent in several areas compared to most places where there are only a few cases. As a result, our modelling of the overdose counts will have to account for the prevalence of census block groups with no cases.
#histogram of overdoses
ov_hist <- ggplot(data=mesa_dataset, aes(overdose_count)) +
geom_histogram()+ ggtitle("Frequency Distribution of Overdoses") +
labs(y = "Count of Census Block Groups", x = "Number of overdoses")+
theme_minimal()
ov_histThe following map reveals the spatial distribution of opioid overdose among Mesa census blocks from 2018 to 2020. By observation, the census blocks with a more significant number of overdose cases tend to cluster. For example, they tend to cluster in the west of Mesa and the centre part of Mesa. On the other hand, those with a relatively lower number of cases also show a general pattern of clustering towards the North region and the South part of Mesa. Therefore, it is worth studying if those with higher values tend to cluster statistically, and so do those with lower values. Global and Local Moran’s I will be performed to test this hypothesis statistically.
mesa_cb_chor <- mutate(mesa_dataset, overdose_cat = cut(overdose_count, breaks = c(-.000001, 1, 3, 6, 10, 14, 45)))
pal <- brewer.pal(6, "OrRd")
plot(mesa_cb_chor[, "overdose_count"],
breaks = c(-.000001, 1, 3, 6, 10, 14, 45),
pal = pal,
main = "Overdoses in Mesa (2018-2020) by Census Block Group")Global Moran’s I is performed to check if autocorrelation of opioid overdoses exists globally. We find that at teh census block group level, counts of overdose incidents display significant positive autocorrelation.
mesa_nb <- poly2nb(mesa_dataset, queen=TRUE)
mesa_weights <- nb2listw(mesa_nb, style="W", zero.policy=TRUE)
#global autocorrelation results
moran.test(mesa_dataset$overdose_count, mesa_weights)Next, we investigate local autocorrelation using Local Moran’s I. The Moran scatterplot below shows the overdose count of each census block is compared to the neighbour’s weighted counts. The scatterplot reveals that majority of autocorrelation is from high overdose values neighboring high values (top right). However, there are also several instances of low values neighboring low values (bottom left).
# Moran Scatterplot
moran.plot(mesa_dataset$overdose_count, mesa_weights,
xlab="Overdose Counts", ylab="Spatailly lagged Overdose Counts")We map the various clusters to examine the spatial distribution of the autocorrelation. The result shows that in most of the city the autocorrelation is not significant. However, there is significant high-high autocorrelation in the west and center of the city. There is one case of low-low autocorrelation in the northeast of the city.
# Morans Clusters
moranCluster <- function(shape, W, var, alpha=0.05)
{
# Code adapted from https://rpubs.com/Hailstone/346625
Ii <- localmoran(shape[[var]], W)
shape$Ii <- Ii[,"Ii"]
shape$Iip <- Ii[,"Pr(z != E(Ii))"]
shape$sig <- shape$Iip<alpha
# Scale the data to obtain low and high values
shape$scaled <- scale(shape[[var]]) # high low values at location i
shape$lag_scaled <- lag.listw(mesa_weights, shape$scaled) # high low values at neighbours j
shape$lag_cat <- factor(ifelse(shape$scaled>0 & shape$lag_scaled>0, "HH",
ifelse(shape$scaled>0 & shape$lag_scaled<0, "HL",
ifelse(shape$scaled<0 & shape$lag_scaled<0, "LL",
ifelse(shape$scaled<0 & shape$lag_scaled<0, "LH", "Equivalent")))))
shape$sig_cluster <- as.character(shape$lag_cat)
shape$sig_cluster[!shape$sig] <- "Non-sig"
shape$sig_cluster <- as.factor(shape$sig_cluster)
results <- data.frame(Ii=shape$Ii, pvalue=shape$Iip, type=shape$lag_cat, sig=shape$sig_cluster)
return(list(results=results))
}
clusters <- moranCluster(mesa_dataset, W=mesa_weights, var="overdose_count")$results
mesa_dataset$Ii_cluster <- clusters$sig
tm_shape(mesa_dataset) + tm_polygons(col="Ii_cluster", palette = "Set3", title = "Clusters") + tm_layout(title= 'Spatial Autocorrelation', fontface = 2)We explored the spatial distribution of our potentnial independent variables by mapping them as shown below.
Our results show that there are several trends associated with the western part of the city. In this part of the city, we find the most extreme number of arrests, the largest proportions of Hispanic residents and lowest proportions of White residents, as well as the lowest proportions of homeowners. In general, we find that most of the city is zoned for low-density residential housing while most of the commercial and high-density residential zoning is in the northwest or across the central part of the city closer to the downtown. Most of the census block groups with large amounts of areas zoned as vacant land are in the eastern part of Mesa. On the other hand, most of the city-owned vacant properties (no inhabitants) are closer to the west of the city. We find that the average distance to greenspace, public safety facilities and graffiti are generally similar across the cities except in the northeast and southeast of the city which are also the areas of the highest incomes. Distance to community development resources increasingly enlarges west of the city. However, this is expected given that the downtown is near the east. Finally, we find an extreme outlier in one census block group in the center of the city with close to a 100% unemployment rate which we will remove from our data set.
ind_vars_table <- mesa_dataset[ind_vars]
vars_net.long <-
gather(ind_vars_table, Variable, value, -geometry)
mapTheme<- function(base_size = 15, title_size = 20) {
theme(
text = element_text( color = "black"),
plot.title = element_text(size = title_size, colour = "black"),
plot.caption=element_text(hjust=0),
axis.ticks = element_blank(),
panel.background = element_blank(),axis.title = element_blank(),
axis.text = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
strip.text.x = element_text(size = 10),
legend.key = element_rect(fill=NA))
}
vars <- unique(vars_net.long$Variable)
mapList <- list()
#map risk factors
for(i in vars){
mapList[[i]] <-
ggplot() +
geom_sf(data = filter(vars_net.long, Variable == i), aes(fill=value), colour=NA) +
scale_fill_viridis(option = "A", name="") +
labs(title=i) +
mapTheme() +
theme(plot.title = element_text(size=10))}
do.call(grid.arrange,c(mapList, ncol = 5, top = "Spatial Distribution of Independent Variables"))We also explored the frequency distributions of the potential independent variables. Most of the distributions are heavily right-skewed or left skewed. Several of variables have several observations in the zero bucket because many census block groups do not have areas that zoned for activities other than low-density housing.
mesa_nogeo <- mesa_dataset
st_geometry(mesa_nogeo) <- NULL
mesa_nogeo <- mesa_nogeo[ind_vars]
data_long <- mesa_nogeo %>% pivot_longer(colnames(mesa_nogeo))%>%as.data.frame()
ggplot(data_long, aes(x = value)) + geom_histogram() + theme(axis.text.x = element_text(angle = 45)) + theme_void() + facet_wrap(~ name, scales = "free")We examine the correlations between the independent variables to check for possible multicollinearity before running our regressions. As expected, some variables are highly correlated such as the proportion of Hispanic residents and White residents or the proportion of homeowners and renters. We find that the proportion of White residents and the proportion of renters are highly negatively correlated with each other but also correlated with several other potential risk factors, so we decide to drop them from our list of independent variables. Several of our distance-related risk factors are also highly positively correlated like the distance to public safety, greenspace and community development resources. This situation is likely because these are all municipal facilities so they are all located closer to the city center.
cormat <- round(cor(mesa_nogeo),2)
ggcorrplot(cormat)Finally, we explore the relationship between the potential risk factors and the number of overdoses in each census block group. We find that there are some variables with very low linear correlations to the number of overdoses such as the population, the percent of land zoned as low-density residential, the pecent of land zoned as vacant, and the unemployment rate. On the other hand, variables such as the number of arrests and the percent of homeowners are much more correlated with overdoses. The scatterplots show that several of relationships do not appear to follow a linear pattern, but this phenomenon might be influenced by the strong spatial relationships we previously observed.
vars_plus <- append(ind_vars,"overdose_count")
mesa_corr <- mesa_dataset
st_geometry(mesa_corr) <- NULL
mesa_corr <- mesa_corr[vars_plus]
mesa_corr <- mesa_corr %>% gather(Variable, Value, -overdose_count) %>%
mutate(Value = as.numeric(Value))
correlation.cor <-
mesa_corr %>%
group_by(Variable) %>%
summarize(correlation = cor(Value, overdose_count, use = "complete.obs"))
ggplot(mesa_corr, aes(Value, overdose_count)) +
geom_point(size = 1) +
geom_text(data = correlation.cor, aes(label = paste("r =", round(correlation, 2))),
x=-Inf, y=Inf, vjust = 1.5, hjust = -.1) +
geom_smooth(method = "lm", se = FALSE, colour = "black") + theme_void() +
facet_wrap(~Variable, ncol = 6, scales = "free") +
labs(title = "Correlations between Number of Overdoses and Risk Variables \n
") +
theme(strip.text.x = element_text(size = 10))After taking into account multicollinearity and the relationship with the overdose count, the variables chosen for the model are: median household income, Black population (as percent), Hispanic population (as percent), number of police incidents, homeowner population (as percent), land zoned as commercial (as percent), land zoned as high-density residential (as percent), average distance to vacant property, average distance to graffiti reports, average distance to greenspace and average distance to public safety.
maricopa_crs <- 'EPSG:2223'
mesa_dataset<- st_read("../mesa_complete_data4.geojson")%>%st_transform(st_crs(maricopa_crs))
all_vars <- c("overdose_count", "arrests_count", "median_income", "perc_black", "perc_hispanic","perc_owners",
"resid_high_per","commercial_per", "dist_greenspace", "dist_vacant_prop", "dist_graffiti",
"dist_public_safety")
ind_vars <- c("arrests_count", "median_income", "perc_black", "perc_hispanic", "perc_owners",
"resid_high_per","commercial_per", "dist_greenspace", "dist_vacant_prop", "dist_graffiti",
"dist_public_safety")
mesa_all <- mesa_dataset[all_vars]By Ayina Anyachebelu
We first build our baseline models using Ordinary Least Squares and Poisson Regression. The formula for the linear regression is as follows:
\[ y_i = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} +... + \beta_nx_{in} + \epsilon_i \] where \(y_i\) is the \(i\)th observation of number of overdoses, \(x_i\) corresponds with each of our n potential risk factors, \(\beta_0\) represents the intercept term, \(\beta_n\) is the nth estimated parameter, and \(\epsilon\) is the error term.
Considering our dependent variable consists of non-negative overdose counts, we also use a Poisson regression with the following equation:
\[ ln(y_i) = \beta_1x_{i1} + \beta_2x_{i2} +... + \beta_nx_{in} + \epsilon_i \] where \(x_1\) = 1 so \(\beta_1\) represents our constant term and for any values of \(x_1\)…\(x_n\) we can use the regression to predict the odds of an overdose happening.
Below are the results of the OLS Model.
reg <- lm(overdose_count ~ arrests_count + median_income + perc_black + perc_hispanic + perc_owners +
resid_high_per + commercial_per + dist_greenspace + dist_vacant_prop + dist_graffiti +
dist_public_safety,
data = mesa_all)
summary(reg)##
## Call:
## lm(formula = overdose_count ~ arrests_count + median_income +
## perc_black + perc_hispanic + perc_owners + resid_high_per +
## commercial_per + dist_greenspace + dist_vacant_prop + dist_graffiti +
## dist_public_safety, data = mesa_all)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.1507 -2.8612 -0.8112 1.5282 30.0266
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.252008260 1.853815332 2.294 0.022433 *
## arrests_count 0.006032666 0.001254947 4.807 0.00000232 ***
## median_income 0.000005764 0.000014174 0.407 0.684506
## perc_black 0.070938516 0.071251288 0.996 0.320162
## perc_hispanic 0.053501444 0.018188173 2.942 0.003494 **
## perc_owners -0.003865112 0.018783864 -0.206 0.837098
## resid_high_per 0.058222957 0.016440101 3.542 0.000455 ***
## commercial_per 0.015697167 0.039800303 0.394 0.693540
## dist_greenspace 0.000060832 0.000141520 0.430 0.667582
## dist_vacant_prop -0.000052372 0.000044145 -1.186 0.236319
## dist_graffiti 0.000315769 0.000280584 1.125 0.261229
## dist_public_safety -0.000381148 0.000140613 -2.711 0.007064 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.585 on 333 degrees of freedom
## Multiple R-squared: 0.3365, Adjusted R-squared: 0.3146
## F-statistic: 15.35 on 11 and 333 DF, p-value: < 0.00000000000000022
The OLS model has an Adjusted R-squared of 0.31, suggesting our factors only explain about 30% of the variation in number of overdoses. At the 5% level, independent variables with statistically significant coefficients were the number of arrests, the percent of Hispanic residents, the percent of the area zoned as high-density residential and the distance to public safety resources. All variables except the distance to public safety resources are positively correlated with number of overdoses.
Below are the results of the Poisson regression. Using this model, in addition to the significant variables in the OLS regression, the percentage of Hispanic residents and percentage of Black residents and the distance to public greenspace were also significant at the 5% level. Particularly interesting results are that every 1-unit increase in the percentage of land zoned as high-density residential corresponds to a ~ 0.98% increase in overdose cases. Similarly every 1-unit increase in the percentage of black residents and hispanic residents corresponded to overdose case increases of 1.17% and 0.70% respectively.
Unlike our hypotheses the farther away from public greenspace and public safety resources, the lower the overdose counts. While there are no grounds for causation, the reason for this phenomenon might be the fact that many of these public facilities are nearer to the city center and farther from suburban outskirts, so more overdoses may be happening in these areas.
preg <- glm(overdose_count ~ arrests_count + median_income + perc_black + perc_hispanic + perc_owners +
resid_high_per + commercial_per + dist_greenspace + dist_vacant_prop + dist_graffiti +
dist_public_safety, family = "poisson", data = mesa_all)
summary(preg)##
## Call:
## glm(formula = overdose_count ~ arrests_count + median_income +
## perc_black + perc_hispanic + perc_owners + resid_high_per +
## commercial_per + dist_greenspace + dist_vacant_prop + dist_graffiti +
## dist_public_safety, family = "poisson", data = mesa_all)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -4.9643 -1.5895 -0.4465 0.7398 6.8914
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.0595832100 0.1566417221 13.148 < 0.0000000000000002 ***
## arrests_count 0.0006844624 0.0000686423 9.971 < 0.0000000000000002 ***
## median_income -0.0000008436 0.0000013771 -0.613 0.5401
## perc_black 0.0116940923 0.0048565931 2.408 0.0160 *
## perc_hispanic 0.0070092185 0.0013777334 5.087 0.000000363 ***
## perc_owners 0.0009652980 0.0014383300 0.671 0.5021
## resid_high_per 0.0098350225 0.0011898858 8.266 < 0.0000000000000002 ***
## commercial_per 0.0045587548 0.0026782416 1.702 0.0887 .
## dist_greenspace -0.0000579105 0.0000143220 -4.043 0.000052664 ***
## dist_vacant_prop 0.0000025045 0.0000043933 0.570 0.5686
## dist_graffiti -0.0000081344 0.0000421016 -0.193 0.8468
## dist_public_safety -0.0001149590 0.0000129847 -8.853 < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 2322.2 on 344 degrees of freedom
## Residual deviance: 1269.0 on 333 degrees of freedom
## AIC: 2206.1
##
## Number of Fisher Scoring iterations: 5
The Moran’s correlation test was conducted to check for residual spatial dependence of both models. As seen below, the test resulted in a Moran’s I of 0.014 and 0.0068 respectively with corresponding Moran I statistics that were not significant on the 5% level. As a result, we cannot retain the null hypothesis that there is no spatial clustering of the residuals.
lm.morantest(reg, mesa_weights, zero.policy=TRUE)##
## Global Moran I for regression residuals
##
## data:
## model: lm(formula = overdose_count ~ arrests_count + median_income +
## perc_black + perc_hispanic + perc_owners + resid_high_per +
## commercial_per + dist_greenspace + dist_vacant_prop + dist_graffiti +
## dist_public_safety, data = mesa_all)
## weights: mesa_weights
##
## Moran I statistic standard deviate = 0.98346, p-value = 0.1627
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I Expectation Variance
## 0.0143625012 -0.0155922393 0.0009277194
lm.morantest(preg, mesa_weights, zero.policy=TRUE)##
## Global Moran I for regression residuals
##
## data:
## model: glm(formula = overdose_count ~ arrests_count + median_income +
## perc_black + perc_hispanic + perc_owners + resid_high_per +
## commercial_per + dist_greenspace + dist_vacant_prop + dist_graffiti +
## dist_public_safety, family = "poisson", data = mesa_all)
## weights: mesa_weights
##
## Moran I statistic standard deviate = 0.41218, p-value = 0.3401
## alternative hypothesis: greater
## sample estimates:
## Observed Moran I Expectation Variance
## 0.006842371 -0.006272096 0.001012336
Although we cannot reject the null hypothesis based of our Moran correlation test results, an examination of the residuals of the OLS model (below), shows that their distribution appear to be non-random with large positive residuals in the west and center of the city. The poisson model has residual values of smaller magnitude which are more evenly spread out spatially. However, there still seems to be some clustering of positive residuals in the west and center of the city. For this reason, we proceed with spatial regression models.
mesa_all$res_m1 <- residuals(reg)
legend_title = expression("OLS model residuals")
map_mesa_res = tm_shape(mesa_all) +
tm_fill(col = "res_m1", title = legend_title, palette = magma(256), style = "cont") + tm_layout(bg.color = "white")
mesa_all$res_m2 <- residuals(preg)
legend_title = expression("Poisson model residuals")
map_mesa_res2 = tm_shape(mesa_all) +
tm_fill(col = "res_m2", title = legend_title, palette = magma(256), style = "cont")+ tm_layout(bg.color = "white")
tmap_arrange(map_mesa_res, map_mesa_res2)By Humza Hussain
The residual values from the OLS regression were observed to see whether any significant autocorrelation existed in these values. In the map above, there do seem to be significant clusters of positive residuals to the west of the city. While this histogram of residuals shows that an approximately normally distribution on the most part (figure ?), outliers to the positive side of the spectrum lead to deviation at the right tail. This is further evidenced by deviation at the right tail of the QQ-plot (figure ?).
mesa_all$lm_res <- residuals(reg)
ggplot(data=mesa_all, aes(lm_res)) + geom_histogram() + ggtitle("Frequency Distribution of Residuals from OLS Regression Model") +
labs(y = "Count of Census Block Groups", x = "Residual Value")+
theme_minimal()Figure ?: Histogram of Linear Model Residuals
ggplot(data=mesa_all, aes(sample=lm_res)) + geom_qq() + geom_qq_line() + labs(y = "Sample", x = "Theoretical")Figure ?: QQ-plot of Linear Model Residuals
The results of the OLS model point to key assumptions of linear regression being violated, most notably the assumption that residuals are independent and not spatially autocorrelated. We attempt to integrate a spatial lag model as one of the spatial regression models to account for spatial autocorrelation in our data.
The spatial lag model assumes that autocorrelation is present in the dependent variable, that being opioid overdose cases. It assumes that the dependant variable is correlated with the same variable at nearby locations, above and beyond other covariates (Ward and Gleditsch 2008).
The spatial lag model is defined as follows:
\[ Y = \rho Wy + X\beta + \epsilon \] Where \(y\) is a vector of observations, \(\rho\) is an autocorrelation parameter, \(W\) is an \(n*n\) spatial weight matrix and \(\epsilon\) is an \(n*1\) vector of independent and identically distributed errors (iid). The spatial lag model follows the iid assumption, that being that the errors are normally distributed with a mean of 0 and a variance of \(\sigma^2\).
Before fitting a spatial lag model, it was assessed whether any of the independent variables required transformation prior to fitting a model. A Generalised Additive Model (GAM) was used to fit a smooth non-parametric function to the opioid overdoes variable using predictor variables (Zuur et al. 2007), and these results were tested against the distribution of independent variables to assess which transformations were appropriate. A spatial lag model was run with both predictor variables left untransformed and one with appropriate transformations to assess which model had the best fit.
Figure ?: Summary of GAM results
Above are the results from the GAM test on predictor variables (figure ?). Only 3 variables showed statistical significance to the 0.05% significance level, that being number of police incidents, percentage of land zoned as high-density residential and average distance to public safety. In the context of a GAM, coefficients of 1 for the effective degrees of freedom coefficient (edf) tend to correspond to smooth functions that are more flexible, a good indicator that the relationship between this predictor variable and the dependant is non-linear. With all three statistically significant variables having edf coefficients above 1, their distributions were analysed further to assess if transformations were appropriate.
While distance to public safety replicated an approximately normal distribution, both police incidents and percentage of land zoned as high-density residentials showed distributions heavily skewed to the left. Therefore, a logarithmic transformation was only applied to police incidents and percentage of land zoned as high-density residentials, with the results from this model summarised below (figure ?).
Figure ?: Spatial Lag Model with Transformed Variables
A model was also run with all predictor variables left untransformed, with the results from this model shown below (figure ?).
Figure ?: Spatial Lag Model with Transformed Variables
A model was also run with all predictor variables left untransformed, with the results from this model shown below (figure ?).
summary(mesa.lag)Figure ?: Spatial Lag Model with Variables left in their natural form
From a first glance, we can see that the model with non-transformed predictor variables has both a lower AIC coefficient (2181.2 vs 2204.4) and a higher log likelihood (-1.076.6 vs -1.088.2). These both indicate that the model with untransformed predictor variables fits our data much better, and further analysis focused on this model.
From the individual variables, only four variables showed statistical significance at the 5% significance level, these being police incidents, Hispanic population percentage, percentage of land zoned as high-density residential and average distance to public safety. While all of their respective coefficients were substantially low, they were all positively associated with opioid overdose cases. For example, the spatial lag model indicates that a one increase in police incidents corresponded to a 0.006% increase in opioid overdoses, and a 1 increase in percentage of Hispanic population corresponded to a 0.05% increase in overdoses. While these observations reaffirm our hypotheses of higher opioid overdoes being found in areas with increased arrests (Choi et al. 2022), higher Latina communities (Lippold et al. 2019) and increased distances from public safety (Li et al. 2019), it is important to assess the suitability of the spatial lag model as a whole before making any final conclusions.
The p-value from the rho autocorrelation parameter of 0.448 is a strong indicator that the additionality of \(\rho\) does not significantly improve the model fit. Furthermore, the p-value of the Wald test of 0.442 indicates that there is insufficient evidence to reject the null hypothesis that the parameters of the spatial lag variable are equal to zero.
The residuals from this spatial lag model further prove that the spatial lag model did a poor job in accounting for the observed spatial autocorrelation across Mesa, given their almost identical nature to residuals from the linear OLS regression model. These can be seen bellow in the map (figure ?), histogram (figure ?) and QQ-plot (figure ?) of the spatial lag model residuals.
mesa_all$log_res <- residuals(mesa.lag)
legend_title <- expression("Spatial Lag Model Residuals")
tm_shape(mesa_all) + tm_fill(col = "log_res", title = legend_title, palette = magma(256), style = "cont") + tm_layout(bg.color = "white")Figure ?: Spatial Distribution of Spatial Lag Model Residuals
ggplot(data=mesa_all, aes(log_res)) + geom_histogram() + ggtitle("Frequency Distribution of Residuals from Spatial Lag Model") +
labs(y = "Count of Census Block Groups", x = "Residual Value")+
theme_minimal()Figure ?: Histogram of Spatial Lag Model Residuals
ggplot(data=mesa_all, aes(sample=log_res)) + geom_qq() + geom_qq_line() + labs(y = "Sample", x = "Theoretical")Figure ?: QQ-plot of Spatial Lag Model Residuals
All in all, these observations indicate that there is insufficient evidence to reject the null hypothesis. Therefore, we cannot say with confidence that the dependant variable (opioid overdoses) is correlated with the same variable at nearby locations. Different spatial regression models may better explain the spatial autocorrelation observed from the exploratory spatial data analysis.
By ____
By ____
By _____
By Ayina Anyachebelu
Given that a Poisson regression better represents the relationship between the risk factors and the overdose count, we attempt to integrate this model and the spatial variation in overdose cases using a geographically weighted poisson regression which captures local relationships.
The geographically weighted poisson regression follows the form:
\[ ln(y_i) = \beta_0u_{i} + \beta_1(u_i)x_{i1} + \beta_2(u_i)x_{i2} +... + \beta_n(u_i)x_{in} + \epsilon \] where where \(y_i\) is the \(i\)th observation of number of overdoses and the \(\beta\) coefficients are functions of the location \(u_i\) designating the coordinates of the \(i\)th census block group, allowing the estimated \(\beta\) parameter to differ between census block groups (Poliart et. al, 2021).
For each census block group, the parameters were estimated by
\[ \overset{\wedge}{\beta}_i = (X^TW(u_{xi},u_{yi})X)^{-1}X^TW(u_{xi},u_{yi})Y \] where \(W(u_{xi},u_{yi})\) denotes a spatial weight matrix containing weights calculated based on the distance between the centroid of the \(i\)th census block group with coordinates (u_{xi},u_{yi}) and its spatial neighbors j (Fortheringham et al, 2003).
The spatial weight \(w_{ij}\) allows for census block groups closer to the \(i\)th census block group to have a larger influence on the parameter estimation for \(\beta_i\). The weight is based on a kernel function. We experiment with two kernel functions the Gaussian and Exponential, which are both continuous functions of the distance between observation points.
The corresponding weight matrices are calculated as follows
Gaussian: \(w_{i j}=\exp \left(-\frac{1}{2}\left(\frac{d_{i j}}{b}\right)^2\right)\) and Exponential: \(w_{i j}=\exp \left(-\frac{\left|d_{i j}\right|}{b}\right)\)
where \(h\) is the kernel bandwidth and \(d_{ij}\) is the distance between points i and j (Gollini et al., 2013).
For each kernel function, we also experiment with both a fixed kernel (which use a fixed bandwidth) and adaptive kernel (which uses a varying bandwidth based on the number of nearest neighbors from a given regression point.) Given we have census block groups of different sizes, the adaptive kernels could be useful because these kernels change in size based on the density of data. Using the fixed kernels could result in calibration on few observation points for local regression in larger units and vice versa resulting in larger standard errors (Gollini et al., 2013).
maricopa_crs <- 'EPSG:2223'
mesa_dataset<- st_read("../mesa_complete_data4.geojson")%>%st_transform(st_crs(maricopa_crs))
mesa_dataset<- st_read("../mesa_complete_data4.geojson")%>%st_transform(st_crs(maricopa_crs))
all_vars <- c("overdose_count", "arrests_count", "median_income", "perc_black",
"perc_hispanic","perc_owners","resid_high_per","commercial_per", "dist_greenspace",
"dist_vacant_prop", "dist_graffiti","dist_public_safety")
mesa_all <- mesa_dataset[all_vars]
st_c <- st_coordinates
st_coid <- st_centroid
mesa.c <- st_c(st_coid(mesa_all))
DM<-gw.dist(dp.locat=coordinates(mesa.c))
mesa_all$long <- mesa.c[,1]
mesa_all$lat <- mesa.c[,2]We select the optimal bandwidth for each kernel by minimizing the Akaike Information Criterion to account for both prediction accuracy as well as lower complexity of our model (Gollini et al., 2015). For the fixed Gaussian and Exponential kernels the optimal bandwidths were 7614.3182847 and 6843.6505642 meters respectively. For the adaptive Gaussian and Exponential kernels, the optimal bandwidths was 18 nearest neighbors.
# FIT THE 4 GWPRs
#Gaussian non-adaptive
bgwr.gnPr <- ggwr.basic(overdose_count ~ arrests_count + median_income + perc_black + perc_hispanic +
perc_owners + resid_high_per + commercial_per + dist_greenspace +
dist_vacant_prop + dist_graffiti + dist_public_safety,
data = mesa_all,
family = "poisson",
bw = bw.gwrG,
kernel = "gaussian",
adaptive = FALSE,
dMat = DM)
#gaussian adaptive
bgwr.gPr <- ggwr.basic(overdose_count ~ arrests_count + median_income + perc_black + perc_hispanic +
perc_owners + resid_high_per + commercial_per + dist_greenspace +
dist_vacant_prop + dist_graffiti + dist_public_safety,
data = mesa_all,
family = "poisson",
bw = bw.gwrGA,
kernel = "gaussian",
adaptive = TRUE,
dMat = DM)
# exponential non-adaptive
bgwr.nPr <- ggwr.basic(overdose_count ~ arrests_count + median_income + perc_black + perc_hispanic +
perc_owners + resid_high_per + commercial_per + dist_greenspace +
dist_vacant_prop + dist_graffiti + dist_public_safety,
data = mesa_all,
family = "poisson",
bw = bw.gwrE,
kernel = "exponential",
adaptive = TRUE,
dMat = DM)
#exponential adaptive
bgwr.Pr <- ggwr.basic(overdose_count ~ arrests_count + median_income + perc_black + perc_hispanic +
perc_owners + resid_high_per + commercial_per + dist_greenspace +
dist_vacant_prop + dist_graffiti + dist_public_safety,
data = mesa_all,
family = "poisson",
bw = bw.gwrEA,
kernel = "exponential",
adaptive = TRUE,
dMat = DM)We fit the 4 geographically weighted Poisson models using their corresponding optimal bandwidths. We find that the model using the exponential adaptive kernel has the lowest AIC and residual sum of squares and the highest Pseudo R-squared values. Below are the maps of the standard deviation of the local residual values for the models. As mentioned, exponential adaptive kernel appears to have the best fit for our data, so we choose these parameters for our final model.
gwr_out <- as.data.frame(bgwr.gnPr$SDF)
mesa_all$gauss_na_localRes<-(gwr_out$residual)/sd(gwr_out$residual)
gwr_out <- as.data.frame(bgwr.gPr$SDF)
mesa_all$gauss_localRes<-(gwr_out$residual)/sd(gwr_out$residual)
gwr_out <- as.data.frame(bgwr.nPr$SDF)
mesa_all$exp_na_localRes<-(gwr_out$residual)/sd(gwr_out$residual)
gwr_out <- as.data.frame(bgwr.Pr$SDF)
mesa_all$exp_localRes<-(gwr_out$residual)/sd(gwr_out$residual)
my.palette <- brewer.pal(n = 8, name = "RdYlBu")
gauss_na_localRes <-spplot(mesa_all,"gauss_na_localRes", main = "Gauss. Non-Adaptive",
col="transparent", col.regions = my.palette, cuts = 7)
gauss_localRes<-spplot(mesa_all,"gauss_localRes", main = "Gauss. Adaptive",
col="transparent",col.regions = my.palette, cuts = 7)
exp_na_localRes<-spplot(mesa_all,"gauss_localRes", main = "Expon. Non-Adaptive",
col="transparent", col.regions = my.palette, cuts = 7)
exp_localRes<-spplot(mesa_all,"exp_localRes", main = "Expon. Adaptive",
col="transparent", col.regions = my.palette, cuts = 7)
grid.arrange(gauss_na_localRes, gauss_localRes, exp_na_localRes, exp_localRes,
ncol= 3, heights = c(20,20), top = textGrob("Stdev of Residuals", gp=gpar(fontsize=20)))Below are the results of the geographically weighted poisson regression using an adaptive exponential kernel. The results show that the model has a pseudo R-squared value of 0.69, meaning that it is able to explain almost 70% of the variation in overdose cases. This is a substantial improvement from the 45% R-squared value of the baseline poisson model.
The results of the GWPR provide us with better insight into how the risk factors vary across the different census block groups. For example, the baseline poisson regression indicated that 1 additional arrest corresponded with a 0.068% increase in overdoses. However, with the GWPR we see that the increase can be over 3 times higher, corresponding to a 0.23% increase in overdoses. The broadness of the ranges in local estimates are even more pronounced for the Hispanic population as well as land zoned as high-density residential. The initial estimate from the baseline model indicated that one percentage point increase in the proportion of Hispanic residents was associated with a 0.7% increase in number of overdoses, but the GWPR reveals that it can be associated with an overdose increase of as high as 2.8% in some city areas. Meanwhile, estimates for increases in incidents corresponding to high-density residential zoning range from as low as 0.1% to as high as ~ 2.5%. The results also reveal that a majority of our distance-related risk factors, especially the distance to public greenspace and distance to public safety are almost always negatively correlated with overdose cases, regardless of the part of the city which disaffirms our original hypothesis that in blocks closer to these facilities there will be less incidents.
bgwr.Pr## ***********************************************************************
## * Package GWmodel *
## ***********************************************************************
## Program starts at: 2023-01-13 11:46:09
## Call:
## ggwr.basic(formula = overdose_count ~ arrests_count + median_income +
## perc_black + perc_hispanic + perc_owners + resid_high_per +
## commercial_per + dist_greenspace + dist_vacant_prop + dist_graffiti +
## dist_public_safety, data = mesa_all, bw = bw.gwrEA, family = "poisson",
## kernel = "exponential", adaptive = TRUE, dMat = DM)
##
## Dependent (y) variable: overdose_count
## Independent variables: arrests_count median_income perc_black perc_hispanic perc_owners resid_high_per commercial_per dist_greenspace dist_vacant_prop dist_graffiti dist_public_safety
## Number of data points: 345
## Used family: poisson
## ***********************************************************************
## * Results of Generalized linear Regression *
## ***********************************************************************
##
## Call:
## NULL
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.0000 -0.6807 -0.2278 0.4445 15.8330
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## Intercept 2.0595832100 0.1566417221 13.148 < 0.0000000000000002 ***
## arrests_count 0.0006844624 0.0000686423 9.971 < 0.0000000000000002 ***
## median_income -0.0000008436 0.0000013771 -0.613 0.5401
## perc_black 0.0116940923 0.0048565931 2.408 0.0160 *
## perc_hispanic 0.0070092185 0.0013777334 5.087 0.000000363 ***
## perc_owners 0.0009652980 0.0014383300 0.671 0.5021
## resid_high_per 0.0098350225 0.0011898858 8.266 < 0.0000000000000002 ***
## commercial_per 0.0045587548 0.0026782416 1.702 0.0887 .
## dist_greenspace -0.0000579105 0.0000143220 -4.043 0.000052664 ***
## dist_vacant_prop 0.0000025045 0.0000043933 0.570 0.5686
## dist_graffiti -0.0000081344 0.0000421016 -0.193 0.8468
## dist_public_safety -0.0001149590 0.0000129847 -8.853 < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 2322.2 on 344 degrees of freedom
## Residual deviance: 1269.0 on 333 degrees of freedom
## AIC: 1293
##
## Number of Fisher Scoring iterations: 5
##
##
## AICc: 1293.976
## Pseudo R-square value: 0.4535294
## ***********************************************************************
## * Results of Geographically Weighted Regression *
## ***********************************************************************
##
## *********************Model calibration information*********************
## Kernel function: exponential
## Adaptive bandwidth: 18 (number of nearest neighbours)
## Regression points: the same locations as observations are used.
## Distance metric: A distance matrix is specified for this model calibration.
##
## ************Summary of Generalized GWR coefficient estimates:**********
## Min. 1st Qu. Median
## Intercept -0.22607694166 1.20067276202 1.73960798969
## arrests_count 0.00042833565 0.00074185490 0.00107242046
## median_income -0.00001113456 -0.00000422743 0.00000013829
## perc_black -0.02218714651 -0.00127668722 0.00569746260
## perc_hispanic -0.00317040486 0.00372447667 0.00640199380
## perc_owners -0.00384157393 0.00040092182 0.00289802946
## resid_high_per 0.00105202619 0.00577706051 0.01019328060
## commercial_per -0.01377220074 0.00243908237 0.01007332235
## dist_greenspace -0.00023266937 -0.00014462074 -0.00004561559
## dist_vacant_prop -0.00004565609 -0.00002088898 -0.00000954670
## dist_graffiti -0.00013024107 0.00001260078 0.00004873067
## dist_public_safety -0.00026929248 -0.00014928048 -0.00010660187
## 3rd Qu. Max.
## Intercept 2.47403573774 3.6311
## arrests_count 0.00152285380 0.0023
## median_income 0.00000312500 0.0000
## perc_black 0.01224007904 0.0332
## perc_hispanic 0.01012181509 0.0283
## perc_owners 0.00740199548 0.0182
## resid_high_per 0.01461808174 0.0242
## commercial_per 0.01523986914 0.0230
## dist_greenspace -0.00001225402 0.0001
## dist_vacant_prop 0.00000815925 0.0001
## dist_graffiti 0.00009227430 0.0003
## dist_public_safety -0.00006211557 0.0000
## ************************Diagnostic information*************************
## Number of data points: 345
## GW Deviance: 719.0314
## AIC : 889.3877
## AICc : 946.1102
## Pseudo R-square value: 0.6903717
##
## ***********************************************************************
## Program stops at: 2023-01-13 11:46:11
Given that the magnitude of the coefficients is very small, it is useful to examine the test statistics to investigate how the statistical significance of these values change spatially. Below are maps of the pseudo t-values for each of the risk factors.
From the results, we observe that the number of arrests is almost always statistically significant in all regions, but it is especially significant in the northeast, southeast and western-central areas of the city. Additionally, the hispanic population is increasingly significant closer to the center of the city while the black population is significant in the far west of the city. Distance to greenspace has a significant negative relationship with overdose incidents in the west of the city while distance to public safety has a significnat negative relationship with overdose cases in the east of the city. However, it is important to note when interpreting these results that thes are municipal-run facilities, so for example, our model is only taking into consideration city-owned parks are not accounting for private greenspace which might be available in other areas. Unsurprisingly, land zoned being zoned commercial is only significant in the east of the city which is reasonable as that is farther from the western downtown area where most of the neighborhoods would be zoned as commercial. On the other hand, distance to vacant property is positively statistically significant in the center of the city but negatively statistically significant in the in the southeast. Similarly, median income is positively significant near the downtown area and central-east, but there are enclaves of blocks where the relationship is negative. Finally, homeownership is mainly statistically significant in the northwest of the city.
mesa_all$t_arrests_count<-bgwr.Pr$SDF$arrests_count_TV
mesa_all$t_perc_black<-bgwr.Pr$SDF$perc_black_TV
mesa_all$t_perc_hispanic<-bgwr.Pr$SDF$perc_hispanic_TV
mesa_all$t_dist_public_safety<-bgwr.Pr$SDF$dist_public_safety_TV
mesa_all$t_resid_high_per<-bgwr.Pr$SDF$resid_high_per_TV
mesa_all$t_dist_greenspace<-bgwr.Pr$SDF$dist_greenspace_TV
mesa_all$t_median_income<-bgwr.Pr$SDF$median_income_TV
mesa_all$t_perc_owners<-bgwr.Pr$SDF$perc_owners_TV
mesa_all$t_commercial_per<-bgwr.Pr$SDF$commercial_per_TV
mesa_all$t_dist_vacant_prop<-bgwr.Pr$SDF$dist_vacant_prop_TV
mesa_all$t_dist_graffiti<-bgwr.Pr$SDF$dist_graffiti_TV
mesa_all$t_Intercept<-bgwr.Pr$SDF$Intercept_TV
my.palette <- brewer.pal(n = 7, name = "YlOrRd")
t_arrest_count<-spplot(mesa_all,"t_arrests_count", main=list(label=" Arrests ", cex=1),
col="transparent",col.regions = my.palette, cuts = 6)
t_perc_hispanic<-spplot(mesa_all,"t_perc_hispanic", main=list(label="Hispanic %", cex=1),
col="transparent",col.regions = my.palette, cuts = 6)
t_perc_black<-spplot(mesa_all,"t_perc_black", main=list(label=" Black % ", cex=1),
col="transparent",col.regions = my.palette, cuts = 6)
t_resid_high_per<-spplot(mesa_all,"t_resid_high_per", main=list(label="High Dens. Residential", cex=1),
col="transparent", col.regions = my.palette, cuts = 6)
t_dist_greenspace<-spplot(mesa_all,"t_dist_greenspace", main=list(label="Dist to Greenspace", cex=1),
col="transparent", col.regions = my.palette, cuts = 6)
t_dist_public_safety<-spplot(mesa_all,"t_dist_public_safety", main=list(label="Dist to Public Safety", cex=1),
col="transparent", col.regions = my.palette, cuts = 6)
t_median_income<-spplot(mesa_all,"t_median_income", main=list(label="Median Income", cex=1),
col="transparent", col.regions = my.palette, cuts = 3)
t_perc_owners<-spplot(mesa_all,"t_perc_owners", main=list(label=" Homeowner % ", cex=1),
col="transparent", col.regions = my.palette, cuts = 6)
t_commercial_per<-spplot(mesa_all,"t_commercial_per", main=list(label="Commercial %", cex=1),
col="transparent", col.regions = my.palette, cuts = 6)
t_dist_vacant_prop<-spplot(mesa_all,"t_dist_vacant_prop", main=list(label="Dist to Vacant Prop.", cex=1),
col="transparent", col.regions = my.palette, cuts = 6)
t_dist_graffiti<-spplot(mesa_all,"t_dist_graffiti", main=list(label="Dist to Graffiti", cex=1),
col="transparent", col.regions = my.palette, cuts = 6)
#t_Intercept<-spplot(mesa_all,"t_Intercept", main = "Intercept",
#col="transparent", col.regions = my.palette, cuts = 6)
grid.arrange(t_arrest_count, t_perc_hispanic, t_perc_black, t_resid_high_per, t_dist_greenspace,
t_dist_public_safety, t_median_income, t_perc_owners, t_commercial_per,
t_dist_vacant_prop, t_dist_graffiti,
ncol = 4, top = textGrob("Local t-values", gp=gpar(fontsize=20)))Discussion on the Modelling Results here - why we pick GWPR; how the city can target the west effectively based on the results.
Summary
Adams J. F. A. (1889), “Substitutes for opium in chronic diseases”, Boston Medical and Surgical Journal, 121, 15, 351-356.
Arizona Department of Health Services. (2022) Opioid Prevention [Online]. Available at: https://www.azdhs.gov/opioid/ (Accessed: 1 December 2022)
Brinkley-Rubinstein L., A. Macmadu, B. D. L. Marshall, A. Heise, S. I. Ranapurwala, J. D. Rich, T. C. Green (2018) “Risk of fentanyl-involved overdose among those with past year incarceration: Findings from a recent outbreak in 2014 and 2015”, Drug and Alcohol Dependence, 185 (2018), 189-191.
Burns J. M., R. F. Martyres, D. Clode and J. M. Boldero (2004) “Overdose in young people using heroin: Associations with mental health, prescription drug use and personal circumstances”, Medical Journal of Australia, 128, 7, 25-28.
Choi J. I., Lee J, Yeh A. B., Lan Q and Kang H, (2022) “Spatial clustering of heroin-related overdose incidents: a case study in Cincinnati, Ohio”, BMC Public Health, 22, 1253. https://doi.org/10.1186/s12889-022-13557-3
Draanen J. V., C. Tsang, S. Mitra, M. Karamouzian, and L. Richardson (2020), “Socioeconomic marginalisation and opioid-related overdose: A systematic review”, Drug and Alcohol Dependence, 214 (2020), 108-127.
Fotheringham, A.S., Brunsdon, C. and Charlton, M., (2003), “Geographically weighted regression: the analysis of spatially varying relationships”. John Wiley & Sons.
Gollini I., Lu B., Charlton M., Brunsdon C., and Harris P. (2013), “GWmodel: an R Package for Exploring Spatial Heterogeneity using Geographically Weighted Models”, arXiv, 2013. https://arxiv.org/abs/1306.0413
Gollini I., Charlton M., Brunsdon C., and Harris P. (2015), “GWmodel: an R Package for Exploring Spatial Heterogeneity using Geographically Weighted Models”, Journal of Statistical Software, 63(17).
Guy G. P., K. Zhang, M. K. Bohm, J. Losby, B. Lewis, R. Young, L. B. Murphy and D. Dowell (2017) “Vital signs: Changes in opioid prescribing in the United States, 2006-2015”, Morbidity and Mortality Weekly Report, 66, 26, 697-704.
Hollingsworth A., C. J. Ruhm and K. Simon (2017) “Macroeconomic conditions and opioid abuse”, Journal of Health Economics, 56 (2017), 223-233.
Kosten T. R. and T. P. George (2002) “The neurobiology of opioid dependence: Implications for treatment”, Science and Practice Perspectives, 1, 1, 13-20.
Krausz R. M., J. N. Westenberg and K. Ziafat (2021) “The opioid overdose crisis as a global health challenge”, Current Opinion in Psychiatry, 34, 4, 405-412.
Lippold K. M., Jones C. M., Olsen E. O. and Giroir B. P. (2019) “Racial/Ethnic and Age Group Differences in Opioid and Synthetic Opioid-Involved Overdose Deaths Among Adults Aged ≥18 Years in Metropolitan Areas - United States, 2015-2017”, MMWR Morb Mortal Wkly Rep, 68, 4, 967-73. doi: 10.15585/mmwr.mm6843a3.
Li Z. R., E. Xie, F. W. Crawford, J. L. Warren, K. McConnell, J. T. Copple, T. Johnson and G. S. Gonsalves (2019) “Suspected heroin-related overdose incidents in Cincinnati, Ohio: A spatiotemporal analysis”, PLoS Medicine, 16, 11, 1-15.
Lyden J. and I. A. Binswanger (2019) “The United States opioid epidemic”, Seminars in Perinatology, 43 (2019), 123-131.
Marshall J. R. S. F. Gassner, C. L. Anderson, R. J. Cooper, S. Lotfipour and B. Chakravarthy (2019) “Socioeconomic and geographical disparities in prescription and illicit opioid-related overdose deaths in Orange County, California, from 2010-2014”, Substance Abuse, 40, 1, 80-86.
Montandon G. and A. S. Slutsky (2019), “Solving the opioid crisis; Respiratory depression by opioids as critical end point”, Chest, 156, 4, 653-658.
Office of Governor Ducey. (2017) Governor Ducey Declares Statewide Health Emergency In Opioid Epidemic [Online]. Available at: https://azgovernor.gov/governor/news/2017/06/governor-ducey-declares-statewide-health-emergency-opioid-epidemic (Accessed: 1 December 2022)
Poliart, A., Kirakoya-Samadoulougou, F., Ouédraogo, M. (et al. (2021), “Using geographically weighted Poisson regression to examine the association between socioeconomic factors and hysterectomy incidence in Wallonia, Belgium.” BMC Women’s Health 21, 373.
Rosenblum A., L. A. Marsch, H. Joseph and R. K. Portenoy (2008), Opioids and the treatment of chronic pain: Controversies, current status and future directions”, Experimental and Clinical Psychopharmacology
Rosner B., J. Neicun, J. C. Yang and A. Roman-Urrestarazu (2019) “Opioid prescription patterns in Germany and the global opioid epidemic: Systematic review of available evidence”, PLoS One, 14, 8, 1-20.
Steif, Ken. (2021) Public Policy Analytics: Code & Context for Data Science in Government. [online]. Available at: https://urbanspatial.github.io/PublicPolicyAnalytics/index.html (Accessed: 1 December 2022)
Ward M. and K. Gleditsch (2008), Spatial Regression Models, SAGE Publications.
Waldhoer M., S. E. Bartlett and J. L. Whistler (2004) “Opioid receptors”, Annual Review of Biochemistry, 73 (2004), 953-990.
Zuur A. F., E. N. Leno and G. M. Smith (2007) Analysing Ecological Data: Statistics for Biology and Health, Springer: New York.